Dealing with Uncertainty about Item Parameters:
Expected Response Functions 



Abstract 

It is common practice in item response theory (IRT) to treat estimates of
item parameters, say $\hat{B}$, as if they were the known, true quantities $B$.
However, ignoring the uncertainty associated with item parameters can lead
to biases and overconfidence in subsequent inferences such as ability
estimation, especially when item-calibration samples are small. This paper
demonstrates how to incorporate uncertainty about $B$ with Lewis’s
“expected response functions” (ERFs), pointwise expected values of item
response conditional on examinee proficiency, averaged over posterior
distributions of item parameters. The paper presents ERFs, outlines
procedures for computing them and using them in practical work, and gives
an illustration with data from the National Assessment of Educational
Progress. Advantages of approximating ERFs with members of familiar
parametric families of IRT curves are noted.

Key words: Bayesian estimation, expected response functions, item
response theory, multiple imputation, pseudolikelihood estimation



Introduction 



Item response theory (IRT) models posit that an examinee’s chances of correctly
answering test items depend on an unobservable parameter for that examinee
($\theta$) and for each of the items ($\beta_j$, for $j = 1, \ldots, n$). It is
common to estimate the item parameters from the responses of a “calibration
sample” of examinees, then treat the estimates
$\hat{B} = (\hat\beta_1, \ldots, \hat\beta_n)$ as if they were true parameter
values in subsequent inferences such as estimating examinees’ proficiency
parameters. Tsutakawa and Johnson (1990) found that ignoring uncertainty
about 3-parameter logistic (3PL) item parameters from a calibration sample of
400 led to biased posterior means for $\theta$s and understatement of posterior
standard deviations by more than 40 percent on average.

Approaches that take uncertainty about $B$ into account include a second-order
Taylor series expansion with an asymptotic normal approximation for $p(B)$
(Tsutakawa & Soltys, 1988; Tsutakawa & Johnson, 1990), numerical integration
over a normal approximation (Jones, Wainer, & Kaplan, 1984), multiple
imputation (Mislevy & Yan, 1991), and Gibbs sampling (Albert, 1992). This paper
presents approximations based on Lewis’s (1985) notion of “expected response
functions” (ERFs), pointwise expected values of item response conditional on
$\theta$, averaged over posterior distributions of item parameters. (See
Mislevy, Sheehan, & Wingersky, 1993, on the use of ERFs in IRT test equating
when information about item parameters is limited.)

The following section describes the problem and reviews previous solutions. ERFs 
and computing approximations are then given. Their use is illustrated with data from the 
National Assessment of Educational Progress. 

Background and Notation 

Item Response Theory 

This paper confines discussion to scalar parametric IRT models for dichotomous
(right/wrong) test items, but the ideas can be extended to more complex models.
Define $F_j(\theta)$, the item response function for Item $j$, as follows:

$$F_j(\theta) = \mathrm{Prob}(X_j = 1 \mid \theta, \beta_j), \qquad (1)$$
where $X_j$ is the response to Item $j$, 1 for right and 0 for wrong, $\theta$
is the examinee proficiency parameter, and $\beta_j$ is the (possibly
vector-valued) parameter for Item $j$. For example, under the 3-parameter
logistic (3PL) model,

$$F_j(\theta) = c_j + (1 - c_j)\,\Psi[a_j(\theta - b_j)],$$

where $\Psi$ is the logistic distribution $\Psi(z) = [1 + \exp(-z)]^{-1}$ and
$\beta_j = (a_j, b_j, c_j)$ (Lord, 1980).

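The density $p(x_j \mid \theta, \beta_j)$ is thus $F_j(\theta)$ if $x_j = 1$
and $1 - F_j(\theta)$ if $x_j = 0$. Under the usual IRT assumption of
conditional independence, the probability of a vector of responses
$x = (x_1, \ldots, x_n)$ to $n$ items is the product over items of terms based
on (1):

$$p(x \mid \theta, B) = \prod_{j=1}^{n} p(x_j \mid \theta, \beta_j) = \prod_{j=1}^{n} F_j(\theta)^{x_j} [1 - F_j(\theta)]^{1 - x_j}. \qquad (2)$$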



Equation 2 is the basis for estimating an examinee’s $\theta$. Suppose $x$ and
$B$ were known. For maximum likelihood estimation, one finds the value of
$\theta$ that maximizes (2), namely, the MLE $\hat\theta$. The asymptotic
variance of the MLE is the inverse of the Fisher information function, which is
a sum of contributions over items:

$$\mathrm{Var}^{-1}(\hat\theta \mid \theta, B) \approx \sum_{j} \frac{[F_j'(\theta)]^2}{F_j(\theta)[1 - F_j(\theta)]}. \qquad (3)$$



For Bayesian inference, if $p(\theta)$ represents prior knowledge about an
examinee’s proficiency before $x$ is observed, then knowledge posterior to the
observation is obtained by Bayes theorem as

$$p(\theta \mid x, B) = \frac{p(x \mid \theta, B)\, p(\theta)}{\int p(x \mid \theta, B)\, p(\theta)\, d\theta}. \qquad (4)$$

The posterior mean and variance are, respectively,

$$E(\theta \mid x, B) = \int \theta\, p(\theta \mid x, B)\, d\theta \qquad (5)$$

and

$$\mathrm{Var}(\theta \mid x, B) = \int \theta^2\, p(\theta \mid x, B)\, d\theta - \left[ \int \theta\, p(\theta \mid x, B)\, d\theta \right]^2. \qquad (6)$$
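As a minimal sketch of Equations 4 through 6, the integrals can be replaced by sums over an evenly spaced grid of proficiency values. The standard normal prior, the grid limits, and the function names below are assumptions made for illustration only.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function in the logistic metric."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def posterior_moments(x, items, lo=-5.0, hi=5.0, points=201):
    """Approximate Equations 4-6 on an evenly spaced theta grid with a N(0, 1) prior."""
    grid = [lo + (hi - lo) * m / (points - 1) for m in range(points)]
    weights = []
    for theta in grid:
        like = 1.0
        for x_j, (a, b, c) in zip(x, items):
            f = irf_3pl(theta, a, b, c)
            like *= f if x_j == 1 else 1.0 - f
        prior = math.exp(-0.5 * theta * theta)      # unnormalized N(0, 1) density
        weights.append(like * prior)
    total = sum(weights)                             # stands in for the denominator of (4)
    post = [w / total for w in weights]
    mean = sum(t * p for t, p in zip(grid, post))                    # Equation 5
    var = sum(t * t * p for t, p in zip(grid, post)) - mean ** 2     # Equation 6
    return mean, var

# Hypothetical items (a, b, c) and responses.
print(posterior_moments([1, 1, 0], [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]))
```

With no items the function simply returns the prior mean and variance, which is a convenient sanity check on the grid.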

Uncertainty About Item Parameters 

Equations 2 through 6 are written as conditional on $B$. It is common to
evaluate such expressions using a point estimate of $B$, say $\hat{B}$, as
obtained for example from the responses
$x_{\mathrm{calib}} = (x_1, \ldots, x_N)$ of a calibration sample of $N$
examinees. For example, the Bayes modal estimate of $B$ when $p(\theta)$ is
known maximizes the posterior distribution for $B$,

$$p(B \mid x_{\mathrm{calib}}) \propto p(x_{\mathrm{calib}} \mid B)\, p(B) = \prod_{i=1}^{N} \int p(x_i \mid \theta, B)\, p(\theta)\, d\theta \; p(B), \qquad (7)$$

where $p(B)$ expresses prior knowledge about $B$ (e.g., Mislevy, 1986;
Tsutakawa, 1984), perhaps uninformative, perhaps based on items’ content or
skill requirements, expert judgments, or experience with similar items
(Mislevy, Sheehan, & Wingersky, 1993). In large samples, the posterior
distribution can be approximated by a multivariate normal distribution with
mean $\hat{B}$ and variance

$$\Sigma_B = -\left[ \frac{\partial^2 \log p(B \mid x_{\mathrm{calib}})}{\partial B\, \partial B'} \right]^{-1}_{B = \hat{B}}. \qquad (8)$$

Values $\hat{B}$ and $\Sigma_B$ for an approximation could be obtained, for
example, as maximum likelihood or Bayesian modal estimates and asymptotic
covariance matrix from Mislevy & Bock’s (1983) BILOG program, as illustrated in
the NAEP example below. In the sequel, we simply use $p(B)$ to stand for
knowledge about $B$ at a given point in time, regardless of its source. Note
that $p(B)$ need not incorporate independence over items.

As Tsutakawa and his colleagues demonstrate (Tsutakawa & Soltys, 1988;
Tsutakawa & Johnson, 1990), ignoring the uncertainty about $B$ (by treating
$\hat{B}$ as $B$) can lead to biases and understated uncertainties in
subsequent inferences about $\theta$s. Incorporating this kind of uncertainty
into analyses is straightforward from a Bayesian perspective: Marginalize with
respect to partially-known quantities. For example, the so-called “marginal
likelihood function” takes uncertainty about $B$ into account in the
likelihood function by integrating (2) with respect to $p(B)$:
$$p(x \mid \theta) = E_B[p(x \mid \theta, B)] = \int p(x \mid \theta, B)\, p(B)\, dB = \int \prod_{j=1}^{n} F_j(\theta)^{x_j} [1 - F_j(\theta)]^{1 - x_j}\, p(B)\, dB, \qquad (9)$$

effectively the average of (2) over all possible values of $B$, each weighted
by its probability given the information from the calibration sample. More
generally, if $G(B)$ is any expression involving item parameters, then

$$E_B[G(B)] = \int G(B)\, p(B)\, dB. \qquad (10)$$



Alternative Approaches 

Closed-form solutions of (10) are not generally forthcoming in IRT. Before 
introducing expected response functions, we briefly review three alternatives: a second- 
order analytic approximation, multiple imputation, and Gibbs sampling. The discussion of 
multiple imputation is more detailed, because the ERF approximation shares intermediary 
steps with multiple imputation and the NAEP example compares numerical results from the 
two approaches. 

Tsutakawa’s second-order expansion uses an approximation due to Lindley (1980):

$$E_B[G(B)] \approx G(\hat{B}) + \frac{1}{2} \sum_{r,s} \hat{G}_{rs} \hat{\Sigma}_{rs}, \qquad (11)$$

where $\hat{G}_{rs}$ is the $(r,s)$th element of
$\partial^2 G(B) / \partial B\, \partial B'$ evaluated at $\hat{B}$ and
$\hat{\Sigma}_{rs}$ is the $(r,s)$th element of $\Sigma_B$, with $r$ and $s$
indexing elements of $B$. When calculating an examinee’s posterior mean (5),
for example, $G(B)$ is $\int \theta\, p(\theta \mid x, B)\, d\theta$. Because
such approximations would be exact if $p(B)$ were
$\mathrm{MVN}(\hat{B}, \Sigma_B)$, their performance in (10) depends on the
accuracy of the asymptotic normal approximation to $p(B)$, which is often
satisfactory in practice since even the usual first-order approximation
$G(\hat{B})$ suffices when the calibration sample is large and $p(B)$ is
concentrated around $\hat{B}$. An impediment to using (11) in practical work is
that derivatives must be calculated for each function G to which it is applied.
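A one-parameter illustration of (11) may help fix ideas. The sketch below is an assumption-laden toy case, not a computation from the paper: $G$ is the 2PN probability of a correct response with known slope, only the difficulty $b$ is uncertain with a normal distribution, and the exact expectation is taken from the closed form for that case given in the Definition section. All numerical values are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def g(b, theta, a):
    """G(b): 2PN probability of a correct response, Phi(a * (theta - b))."""
    return norm_cdf(a * (theta - b))

def g_second_derivative(b, theta, a):
    """d^2 G / d b^2 = -a^2 * z * phi(z), with z = a * (theta - b)."""
    z = a * (theta - b)
    return -a * a * z * norm_pdf(z)

def approximations(theta, a, b_hat, sigma):
    """First-order value, second-order value from (11), and the exact expectation."""
    first = g(b_hat, theta, a)
    second = first + 0.5 * g_second_derivative(b_hat, theta, a) * sigma ** 2
    exact = norm_cdf((theta - b_hat) / math.sqrt(a ** -2 + sigma ** 2))
    return first, second, exact

first, second, exact = approximations(theta=1.0, a=1.0, b_hat=0.0, sigma=0.5)
print(first, second, exact)
```

In this toy case the second-order term pulls the plug-in value toward the exact expectation, but the derivative had to be worked out by hand for this particular $G$, which is the practical impediment noted above.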



Expected response functions 
Page 5 

Albert (1992) employed Gibbs sampling (Gelfand & Smith, 1990) to obtain a
discrete approximation to the joint posterior distribution of $B$ and the
vector of examinee abilities $\boldsymbol{\theta}$ under the 2-parameter normal
(2PN) IRT model. From vectors $B^{(t)}$ and $\boldsymbol{\theta}^{(t)}$ that
approximate $B$ and $\boldsymbol{\theta}$, one obtains a subsequent
approximation by drawing $B^{(t+1)}$ from
$p(B \mid \boldsymbol{\theta} = \boldsymbol{\theta}^{(t)}, X)$, then drawing
$\boldsymbol{\theta}^{(t+1)}$ from
$p(\boldsymbol{\theta} \mid B = B^{(t+1)}, X)$. From initial approximations,
repeated cycles achieve (under regularity conditions) a stochastic convergence
such that a $(\boldsymbol{\theta}, B)$ draw obtained in this manner is
essentially a draw from the correct posterior
$p(\boldsymbol{\theta}, B \mid X)$. Widely spaced draws from a sequence which
has attained convergence (or, better still, from separate sequences initiated
from different starting points; see Gelman & Rubin, 1992) are essentially
independent draws from $p(\boldsymbol{\theta}, B \mid X)$. Evaluating any
function $G(\boldsymbol{\theta}, B)$ of the parameters with respect to each of
these draws constitutes a discrete approximation of its posterior distribution.
(This last idea will be illustrated below with multiple imputation.) In
particular, the discrete approximation of $p(B)$ can serve as a basis for
calculating expected response functions. Gibbs sampling is much more
computationally intensive than the other approximations described in this
paper.

Multiple imputation, introduced by Rubin (1987) to handle missing responses in
sample surveys, creates pseudo data sets with draws from the posterior
distributions of missing data, and combines the results of standard analyses of
the pseudo data sets so as to incorporate the uncertainty that missingness
engenders. $B$ plays the role of missing data in the problem of imperfect
knowledge about item parameters (Mislevy & Yan, 1991). Suppose that if $B$
were known, we could calculate the posterior mean and variance of a quantity of
interest, say $G(B)$ and $V(B)$. An example again would be the posterior mean
and variance for an examinee’s $\theta$ via (5) and (6). The steps for
multiple-imputation approximations of the posterior mean and variance that take
uncertainty about $B$ into account, say $\bar{G}$ and $\tilde{V}$, are outlined
below. The reader is referred to Rubin (1987) for theoretical justification.

1. Obtain the posterior distribution for $B$, $p(B)$ (e.g., the multivariate
normal approximation $\mathrm{MVN}(\hat{B}, \Sigma_B)$ used in the following
NAEP example).

2. Draw $K$ item parameter vectors from $p(B)$, say $B_k$ for
$k = 1, \ldots, K$.

3. For each $k$, calculate the posterior mean and variance conditional on
$B = B_k$, denoted $G(B_k)$ and $V(B_k)$.

4. The posterior mean for $G$, accounting for uncertainty about $B$, is
approximated by the average of the $K$ conditional posterior means:

$$\bar{G} = K^{-1} \sum_{k} G(B_k). \qquad (12)$$

5. The posterior variance for $G$, accounting for uncertainty about $B$, is
approximated by the sum of two terms:

$$\tilde{V} = \bar{V} + V_B, \qquad (13)$$

where the first,

$$\bar{V} = K^{-1} \sum_{k} V(B_k),$$

approximates the variance that would exist even if $B$ were known with
certainty, and the second,

$$V_B = (K - 1)^{-1} \sum_{k} [G(B_k) - \bar{G}]^2,$$

quantifies additional uncertainty due to not knowing $B$.

Example: Data from NAEP

We shall use a running example with data from the National Assessment of
Educational Progress (NAEP): responses to 19 items from 100 8- and 13-year-old
students who participated in the 1986 and 1988 mathematics trend assessment.
Table 1 gives descriptive statistics and Bayesian posterior modal estimates
$\hat{B} = (\hat{a}, \hat{b}, \hat{c})$ obtained with Mislevy and Bock’s (1983)
BILOG computer program. Table 2 gives the accompanying approximation of the
posterior covariance matrix $\Sigma_B$. Covariances among the three parameters
for the same item can be quite high, but relationships among parameters for
different items are uniformly much lower.

[[Tables 1 & 2 about here]]

A practical problem in applying multiple imputation is to determine the value
of $K$ that provides the desired accuracy, which may differ with the target
$G$. In the NAEP example, Mislevy and Yan (1991) calculated examinees’
posterior means and variances with $K$ = 10, 100, and 1000. $K$ = 10 proved
stable for estimating posterior means, but not for posterior variances, which
were stable with $K$ = 100. Results for $K$ = 100 and $K$ = 1000 were
indistinguishable. We use the $K$ = 100 results below as a baseline comparison
for corresponding estimates calculated with ERFs. The dotted lines in Figure 1
illustrate the item response functions for four items from the NAEP example
that correspond to 100 draws of $B$. (The solid and dashed lines will be
discussed below.) These graphs depict the nature and magnitude of uncertainty
about item response functions, but not the mild co-relationship among the
curves induced by the nonzero inter-item covariances.

[[Figure 1 about here]] 
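The multiple-imputation steps leading to (12) and (13) can be sketched as follows. This is a simplified illustration and not the NAEP computation: it assumes a 2PL model, independent normal posteriors on $(\log a_j, b_j)$ for each item in place of a full multivariate normal, a standard normal prior for $\theta$, and hypothetical parameter values throughout.

```python
import math
import random

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior_mean_var(x, items, points=121):
    """Posterior mean and variance of theta (Equations 5 and 6) on a grid, N(0, 1) prior."""
    grid = [-6.0 + 12.0 * m / (points - 1) for m in range(points)]
    w = []
    for t in grid:
        like = math.exp(-0.5 * t * t)
        for x_j, (a, b) in zip(x, items):
            f = irf_2pl(t, a, b)
            like *= f if x_j == 1 else 1.0 - f
        w.append(like)
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(grid, w)) / total
    var = sum(t * t * wi for t, wi in zip(grid, w)) / total - mean ** 2
    return mean, var

def multiple_imputation(x, b_hat, b_sd, K=100, seed=0):
    """Steps 2-5: draw B_k, compute G(B_k) and V(B_k), then combine as in (12) and (13)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(K):
        # Step 2 (simplified): independent normal draws on (log a, b) for each item.
        items_k = [(math.exp(rng.gauss(math.log(a), sa)), rng.gauss(b, sb))
                   for (a, b), (sa, sb) in zip(b_hat, b_sd)]
        g_k, v_k = posterior_mean_var(x, items_k)                 # Step 3
        means.append(g_k)
        variances.append(v_k)
    g_bar = sum(means) / K                                        # Equation 12
    v_bar = sum(variances) / K                                    # first term of (13)
    v_b = sum((g - g_bar) ** 2 for g in means) / (K - 1)          # second term of (13)
    return g_bar, v_bar + v_b, v_bar, v_b

b_hat = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]     # hypothetical point estimates (a, b)
b_sd = [(0.3, 0.4), (0.3, 0.3), (0.3, 0.5)]       # hypothetical posterior SDs of (log a, b)
g_bar, v_total, v_bar, v_b = multiple_imputation([1, 1, 0], b_hat, b_sd)
print(g_bar, v_total, v_bar, v_b)
```

Setting all posterior standard deviations to zero collapses the procedure to the usual known-parameter analysis, with the between-draw term vanishing.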

Expected Response Functions 

Definition 

In dichotomous IRT models, the expected value of a correct response to Item $j$
given $\theta$ and $B$ is $F_j(\theta) = P(X_j = 1 \mid \theta, \beta_j)$. If
$B$ is only partially known, through $p(B)$, the probability of a correct
response conditional on $\theta$ but marginal with respect to $B$ can be
written as

$$F_j^*(\theta) = E_B[F_j(\theta)] = \int p(x_j = 1 \mid \theta, \beta_j)\, p(B)\, dB = \int p(x_j = 1 \mid \theta, \beta_j)\, p(\beta_j)\, d\beta_j, \qquad (14)$$

an “expected response function” that gives the probability of correct response
conditional on $\theta$ taking into account uncertainty about $B$ (Lewis,
1985).

Even though $F_j^*$ is the expected value of a correct response at each value
of $\theta$, it is not the same as $F_j(\theta)$ evaluated with the expected
value of $\beta_j$. This can be seen in Figure 1, which shows expected response
functions (dashed lines) for the four items from the NAEP example, along with
the curves that correspond to $F_j(\theta)$ as evaluated with the point
estimate $\hat\beta_j$ (solid lines). In particular, the ERF is generally
flatter.

The shape of $F_j^*$ depends on the shape of $F_j$ and the character of
$p(\beta_j)$. In general, $F_j^*$ and $F_j$ will not be of the same functional
form. Lewis (1985) shows that if $F_j$ were 2PN and
$p(\beta_j) \equiv p(a_j, b_j)$ were bivariate normal, then $F_j^*$ would be a
2-parameter ogive with a Student’s $t$ shape. Its location parameter, $b_j^*$,
would have the same value as the Bayes mean estimate for $b_j$, or $\bar{b}_j$,
but its slope parameter, $a_j^*$, would be attenuated from the Bayes mean
estimate for $a_j$. A simpler result is obtained if $a_j$ is known with
certainty a priori. If $p(b_j)$ is $N(\bar{b}_j, \sigma_j^2)$, then $F_j^*$ is
also 2PN, with $b_j^* = \bar{b}_j$ and

$$a_j^* = (a_j^{-2} + \sigma_j^2)^{-1/2}.$$
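The known-slope result can be checked numerically. The sketch below averages the 2PN response function over a normal distribution for $b_j$ with a midpoint rule and compares the result with the attenuated-slope 2PN curve, reading the garbled display as $a_j^* = (a_j^{-2} + \sigma_j^2)^{-1/2}$. The parameter values, the quadrature settings, and the function names are hypothetical choices for illustration.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erf_by_quadrature(theta, a, b_bar, sigma, points=4000, width=8.0):
    """Average Phi(a * (theta - b)) over b ~ N(b_bar, sigma^2) with a midpoint rule."""
    h = 2.0 * width * sigma / points
    total = 0.0
    for m in range(points):
        b = b_bar - width * sigma + (m + 0.5) * h
        density = math.exp(-0.5 * ((b - b_bar) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += norm_cdf(a * (theta - b)) * density * h
    return total

def erf_closed_form(theta, a, b_bar, sigma):
    """2PN curve with location b_bar and attenuated slope a* = (a^-2 + sigma^2)^(-1/2)."""
    a_star = (a ** -2 + sigma ** 2) ** -0.5
    return norm_cdf(a_star * (theta - b_bar))

for theta in (-2.0, -0.5, 0.3, 1.0, 2.5):
    print(theta, erf_by_quadrature(theta, 1.5, 0.3, 0.7), erf_closed_form(theta, 1.5, 0.3, 0.7))
```

Because the attenuated slope is always smaller than $a_j$, the expected curve is flatter than the curve evaluated at the point estimate, in line with the pattern described for Figure 1.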



Approximation with ERFs 

ERFs serve as a potential basis for taking uncertainty about $B$ into account,
by replacing occurrences of $F_j$s with $F_j^*$s in functions of interest
$G(B)$. As examples, consider the following:

Likelihood estimation of $\theta$ proceeds by maximizing an ERF-based analogue
of the likelihood, namely

$$p^*(x \mid \theta) = \prod_{j=1}^{n} F_j^*(\theta)^{x_j} [1 - F_j^*(\theta)]^{1 - x_j}. \qquad (15)$$

One way to justify maximizing $p^*(x \mid \theta)$ is to view it as an
approximation of the marginal likelihood:

$$\begin{aligned}
p(x \mid \theta) &= E_B[p(x \mid \theta, B)] \\
&= \int \prod_{j=1}^{n} F_j(\theta)^{x_j} [1 - F_j(\theta)]^{1 - x_j}\, p(B)\, dB \\
&\approx \prod_{j=1}^{n} \int F_j(\theta)^{x_j} [1 - F_j(\theta)]^{1 - x_j}\, p(\beta_j)\, d\beta_j \\
&= \prod_{j=1}^{n} F_j^*(\theta)^{x_j} [1 - F_j^*(\theta)]^{1 - x_j} \\
&= p^*(x \mid \theta).
\end{aligned}$$

The step in which the approximation occurs replaces each
$p(\beta_j \mid \beta_{j-1}, \ldots, \beta_1)$ with $p(\beta_j)$. Thus, if the
information about items is independent (that is, $p(B) = \prod_j p(\beta_j)$),
the result is exact. Likelihood and Bayesian inferences about $\theta$ that
take uncertainty about $B$ into account exhibit in this case the same
conditional independence form as when item parameters are known. In particular,
applying standard procedures for known item response functions to obtain MLEs
and asymptotic variances (3), but with $F_j^*$s in place of $F_j$s, gives the
correct results. Independent posteriors for items can be assured or closely
approximated by coupling special item-calibration sampling designs and test
construction designs; the idea is that, for the items appearing in a test, the
sets of examinees in the calibration sample responding to each of them are
completely or nearly disjoint. For example, randomly equivalent calibration
samples of examinees can be administered disjoint blocks of items, and
operational test forms can be built with items from different blocks.
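The role of independence in this argument can be illustrated with a small discrete example. In the sketch below, $p(B)$ is a hypothetical distribution over a handful of candidate parameter values for two 2PL items, so that the marginal likelihood and the product of ERFs (15) can both be computed exactly; the item values and probabilities are invented for illustration.

```python
import math
from itertools import product

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_prob(theta, x, params):
    """Equation 2 for one fixed set of item parameters."""
    value = 1.0
    for x_j, (a, b) in zip(x, params):
        f = irf_2pl(theta, a, b)
        value *= f if x_j == 1 else 1.0 - f
    return value

def marginal_likelihood(theta, x, joint):
    """Marginal likelihood for a discrete p(B); joint maps parameter tuples to probabilities."""
    return sum(p * pattern_prob(theta, x, params) for params, p in joint.items())

def erf_likelihood(theta, x, joint):
    """Equation 15: product of per-item ERFs, each using only that item's marginal."""
    value = 1.0
    for j in range(len(x)):
        f_star = sum(p * irf_2pl(theta, *params[j]) for params, p in joint.items())
        value *= f_star if x[j] == 1 else 1.0 - f_star
    return value

# Two hypothetical items, each with two candidate (a, b) values.
item1 = [(0.8, -0.5), (1.4, 0.0)]
item2 = [(1.0, 0.5), (1.8, 1.0)]
independent = {(u, v): 0.25 for u, v in product(item1, item2)}
dependent = {(item1[0], item2[0]): 0.5, (item1[1], item2[1]): 0.5}

for label, joint in (("independent", independent), ("dependent", dependent)):
    print(label, marginal_likelihood(0.3, [1, 1], joint), erf_likelihood(0.3, [1, 1], joint))
```

Under the independent distribution the two quantities agree to rounding error; under the dependent one they differ, although both distributions imply the same per-item ERFs.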



A second justification applies even if $p(B)$ is not independent over items.
Although the dependencies among items are ignored, (15) is an example of what
Arnold and Strauss (1991) call a “pseudo-likelihood” (see Appendix); under
regularity conditions on the $F_j^*$s, its maximum is a consistent estimator of
$\theta$. Thus likelihood point estimates of $\theta$ based on (15) tend to
have the correct central tendency. Applying the standard MLE variance formula
(3) with $F_j^*$s tends to give too optimistic an impression of the uncertainty
about $\theta$s, however. But if the dependencies among items are small (and
they tend toward zero in long tests; Mislevy & Sheehan, 1989), the degree to
which this value understates uncertainty will also be small.

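Bayesian inference about $\theta$ can employ the above approximation
$p^*(x \mid \theta)$ for likelihoods. The posterior distribution for $\theta$
is thus approximated as

$$p(\theta \mid x) \approx p^*(\theta \mid x) = \frac{p^*(x \mid \theta)\, p(\theta)}{\int p^*(x \mid \theta)\, p(\theta)\, d\theta},$$

and the posterior mean and variance are approximated as

$$E(\theta \mid x) = \iint \theta\, p(\theta, B \mid x)\, d\theta\, dB \approx \int \theta\, p^*(\theta \mid x)\, d\theta \qquad (16)$$

and

$$\mathrm{Var}(\theta \mid x) = \iint \theta^2\, p(\theta, B \mid x)\, d\theta\, dB - \left[ \iint \theta\, p(\theta, B \mid x)\, d\theta\, dB \right]^2 \approx \int \theta^2\, p^*(\theta \mid x)\, d\theta - \left[ \int \theta\, p^*(\theta \mid x)\, d\theta \right]^2. \qquad (17)$$

Again the approximations are exact if $p(B)$ is independent over items, and
indicators of uncertainty tend to be optimistic to the extent that dependencies
among items are nonnegligible. Some numerical results on this point appear in
the NAEP example.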

The test characteristic function is the expected number-correct score on a test
of $n$ items as a function of $\theta$. Mislevy, Sheehan, & Wingersky (1993)
obtained test characteristic functions with ERFs, in order to equate tests with
sparse item-calibration data. IRT true-score equating determines number-right
(or formula) scores on different tests that correspond to the same values of
$\theta$ (Lord, 1980). The expected number-right score on Test A for an
examinee with proficiency $\theta$ is obtained as

$$\tau_A(\theta) = \sum_{j \in T_A} p(x_j = 1 \mid \theta, \beta_j) = \sum_{j \in T_A} F_j(\theta), \qquad (18)$$

where $T_A$ is the set of indices of items that appear in Test A. The expected
score on Test B, $\tau_B(\theta)$, is defined analogously. A score on Test A
and a score on Test B are “true-score equated” if they are the respective
expected scores of the same value of $\theta$.

When knowledge about $B$ is imperfect, one must equate scores that are
expectations conditional on $\theta$ but marginal with respect to $p(B)$,
rather than expected scores conditional on $\theta$ and $B$. The expected true
score on Test A given $\theta$ under these circumstances is thus

$$\tau_A^*(\theta) = E_B[\tau_A(\theta)] = \sum_{j \in T_A} E_B[F_j(\theta)] = \sum_{j \in T_A} F_j^*(\theta). \qquad (19)$$

This is simply the sum of the probabilities of correct response item by item,
whether or not $p(B)$ is independent over items. A score on Test A and a score
on Test B are “expected true-score equated” if they are the respective expected
scores of the same value of $\theta$, as defined by (19). Because only expected
scores are needed for this equating method, the expected test characteristic
curves obtained in (19) are correct whether or not the posteriors for
individual items are independent.
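That (19) holds regardless of dependence among items is just linearity of expectation, which a small sketch makes concrete. The discrete distribution over item parameters below is hypothetical and deliberately strongly dependent, and the 2PL form is assumed only for brevity.

```python
import math

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_tcc_joint(theta, joint):
    """E_B of the test characteristic function, averaging over the joint p(B)."""
    return sum(p * sum(irf_2pl(theta, a, b) for a, b in params)
               for params, p in joint.items())

def expected_tcc_from_erfs(theta, joint):
    """Right side of (19): sum over items of per-item ERFs."""
    n_items = len(next(iter(joint)))
    return sum(sum(p * irf_2pl(theta, *params[j]) for params, p in joint.items())
               for j in range(n_items))

# Hypothetical, strongly dependent p(B) over two items: each key is ((a1, b1), (a2, b2)).
joint = {((0.8, -0.5), (1.0, 0.5)): 0.5,
         ((1.4, 0.0), (1.8, 1.0)): 0.5}

for theta in (-2.0, 0.0, 1.5):
    print(theta, expected_tcc_joint(theta, joint), expected_tcc_from_erfs(theta, joint))
```

The two computations agree at every $\theta$, so expected true-score equating needs only the item-by-item ERFs.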

Computing Approximations 

As noted above, closed-form solutions for $F_j^*$ are not generally available.
This section describes how to use multiple-imputation or Gibbs-sampler discrete
estimates of $p(\beta_j)$ to estimate $F_j^*$ point by point across a grid of
$\theta$ values for each item. Because only $p(\beta_j)$ is involved for Item
$j$, not the posteriors for other items, this process can be carried out
independently over items. Subsequent inferences about $\theta$ can be drawn
using these points in a discrete approximation of the $\theta$ distribution and
the response curve, or a smooth curve can be fit to the probabilities thus
obtained.

There are operational advantages to using the closest curve from a familiar
family to approximate $F_j^*$: for example, the closest 3PL curve in
applications based on the 3PL model, or the closest 2PL curve in applications
based on the 1PL or 2PL. Let $F_j^{**}$ denote such an approximation. This
expedient makes it possible to use standard off-the-shelf software designed for
popular parametric IRT models to estimate examinee scores, construct tests, or
draw equating lines. If additional information about item parameters becomes
available over time, as might occur as examinee responses are acquired in
operational testing, it can be incorporated into the system by merely updating
item parameter values under the same model. If the IRT model were correct and
the response function were stable over time, the sequence of expected response
curves would converge toward the closest member of the family to the true curve
(to the true curve itself, if it were a member of the family).

We now describe the operational procedures we have used for applied work with
ERFs. The expected response function for a particular item, $F_j^*$, is
approximated as follows:

1. Obtain an estimate of the posterior distribution $p(\beta_j)$. As noted
above, this is usually based on a calibration sample of examinee responses
(say, a multivariate normal approximation with parameter estimates from BILOG),
but it may also be based partly or wholly on collateral information about items
such as content specifications and cognitive processing requirements (Mislevy,
Sheehan, & Wingersky, 1993).

2. Specify a grid of $M$ $\theta$ values across the ability range of interest.
Let $\theta_m$ denote the $m$th grid point.

3. Draw $K$ item parameter vectors from $p(\beta_j)$. Let $\beta_j^{(k)}$ be
the $k$th such draw.

4. For each of the $K$ sets of item parameters, determine $P_{jm}^{(k)}$, the
probability of a correct response to Item $j$ at $\theta_m$, where
$P_{jm}^{(k)} = p(x_j = 1 \mid \theta = \theta_m, \beta_j = \beta_j^{(k)})$.

5. Compute the expectation at each point by averaging the probabilities
obtained in Step 4:

$$F_j^*(\theta_m) \approx K^{-1} \sum_{k=1}^{K} P_{jm}^{(k)}.$$

We refer to the collection of points
$\{(\theta_m, F_j^*(\theta_m)): m = 1, \ldots, M\}$ as the “nonparametric”
expected response function because it does not assume any particular
parametric form.
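The five steps above can be sketched in a few lines. The code is an illustration under stated assumptions rather than the authors' program: the posterior is taken to be multivariate normal on $(\log a_j, b_j, \mathrm{logit}\, c_j)$, the mean vector and covariance matrix are hypothetical, and a small hand-written Cholesky factorization is used so that the sketch needs only the standard library.

```python
import math
import random

def irf_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def cholesky(cov):
    """Lower-triangular L with L L' = cov, for a small positive definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def draw_item_parameters(mean, L, rng):
    """One draw of (a, b, c) from an MVN defined on (log a, b, logit c)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    t = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mean))]
    return math.exp(t[0]), t[1], 1.0 / (1.0 + math.exp(-t[2]))

def nonparametric_erf(mean, cov, grid, K=100, seed=0):
    rng = random.Random(seed)
    L = cholesky(cov)
    draws = [draw_item_parameters(mean, L, rng) for _ in range(K)]    # Step 3
    # Steps 4 and 5: average the K response probabilities at each grid point.
    return [sum(irf_3pl(theta, *d) for d in draws) / K for theta in grid]

grid = [-3.0 + 0.2 * m for m in range(31)]                            # Step 2
mean = [math.log(1.2), -0.5, math.log(0.2 / 0.8)]                     # hypothetical
cov = [[0.09, 0.02, 0.00], [0.02, 0.16, 0.01], [0.00, 0.01, 0.25]]    # hypothetical
erf = nonparametric_erf(mean, cov, grid)
print(erf[0], erf[15], erf[30])
```

Because each item is handled separately, the function can simply be called once per item with that item's posterior.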

For applied work, it may be convenient to approximate the nonparametric ERF
with a continuous approximation $F_j^{**}$, say a spline or a close-fitting 2PL
or 3PL curve. The use of a 3PL will be illustrated below. Maximum likelihood
estimates for the 3PL item parameters
$\beta_j^{**} = (a_j^{**}, b_j^{**}, c_j^{**})$ that best approximate $F_j^*$
are found by maximizing

$$\sum_{m=1}^{M} W_m \left\{ F_j^*(\theta_m) \log F_j^{**}(\theta_m) + [1 - F_j^*(\theta_m)] \log [1 - F_j^{**}(\theta_m)] \right\} \qquad (20)$$

over the $M$-point $\theta$ grid, where $W_m$ is a weight that specifies the
relative importance of fitting $F_j^{**}$ at $\theta_m$. For example, weights
may be selected to simulate a rectangular distribution of examinees or a normal
distribution of examinees. The maximum may be obtained iteratively by using
Newton’s method to obtain successive corrections to the parameter estimates. We
refer to the solution as a “fitted” expected response function.
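A rough sketch of the fitting step follows. Two assumptions should be kept in view: the criterion is taken to be a weighted binomial log-likelihood, with the nonparametric ERF values playing the role of observed proportions, and an exhaustive search over a coarse lattice of $(a, b, c)$ values is substituted for Newton’s method purely to keep the example short and dependency-free. The target curve is itself a 3PL curve with hypothetical parameters, standing in for a nonparametric ERF.

```python
import math

def irf_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def criterion(params, grid, target, weights):
    """Weighted binomial log-likelihood of a 3PL curve against target probabilities."""
    a, b, c = params
    total = 0.0
    for theta, p, w in zip(grid, target, weights):
        f = irf_3pl(theta, a, b, c)
        total += w * (p * math.log(f) + (1.0 - p) * math.log(1.0 - f))
    return total

def fit_3pl_grid(grid, target, weights):
    """Exhaustive search over a coarse (a, b, c) lattice; a stand-in for Newton's method."""
    candidates = ((i / 10.0, k / 10.0, m / 20.0)
                  for i in range(2, 31)        # a from 0.2 to 3.0
                  for k in range(-30, 31)      # b from -3.0 to 3.0
                  for m in range(0, 11))       # c from 0.0 to 0.5
    return max(candidates, key=lambda prm: criterion(prm, grid, target, weights))

grid = [-3.0 + 0.2 * m for m in range(31)]
weights = [math.exp(-0.5 * t * t) for t in grid]        # standard normal weighting
target = [irf_3pl(t, 1.2, -0.5, 0.2) for t in grid]     # stand-in for a nonparametric ERF
a_fit, b_fit, c_fit = fit_3pl_grid(grid, target, weights)
print(a_fit, b_fit, c_fit)
```

When the target is exactly a 3PL curve whose parameters lie on the lattice, the search recovers them; for a genuine nonparametric ERF it returns only the best lattice point, which an iterative method would then refine.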

Example (continued) 

The BILOG calibration of the 19 previously described NAEP items with 100
examinees provided the posterior mode estimates
$(\hat{a}_j, \hat{b}_j, \hat{c}_j)$ and the corresponding large-sample
approximation of the covariance matrix discussed above. Due to range
restrictions on the $a$’s and $c$’s, we worked with a multivariate normal (MVN)
approximation for the posterior of
$\beta_j = (\log(a_j), b_j, \mathrm{logit}(c_j))$, where
$\mathrm{logit}(c_j) = \log[c_j / (1 - c_j)]$. $p(\beta_j)$ was thus
approximated as MVN with mean vector
$\hat\beta_j = (\log(\hat{a}_j), \hat{b}_j, \mathrm{logit}(\hat{c}_j))$ and
covariance matrix obtained through the delta method from the covariance matrix
for the untransformed parameters. Nonparametric and fitted 3PL ERFs were
calculated for each item. Figure 2 presents results for the four items which
previously appeared in Figure 1. The nonparametric ERFs were obtained using 100
draws from $p(\beta_j)$ and a grid of 31 evenly spaced $\theta$ values ranging
from -3 to +3 in steps of .2. The fitted curves employed a standard normal
weighting function over the same range. The item response functions that
correspond to $F_j(\theta)$ evaluated with the point estimate $\hat\beta_j$ are
also plotted for comparison.

These curves are noticeably steeper than the two expected response curves. Thus, one 
effect of ignoring uncertainty about item parameters is a tendency to inflate belief about the 
discriminating power of an item. 
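The delta-method step mentioned above can be sketched for a single item. The code assumes the usual first-order rule, in which the covariance matrix of the transformed parameters is the Jacobian of the transformation times the original covariance matrix times the Jacobian transposed; here the Jacobian is diagonal. The numerical inputs are the Item 1 entries of Tables 1 and 2, and the function name is hypothetical.

```python
import math

def transformed_covariance(a, c, cov):
    """Delta-method covariance of (log a, b, logit c) from the covariance of (a, b, c)."""
    # Diagonal Jacobian: d log(a)/da = 1/a, db/db = 1, d logit(c)/dc = 1 / (c * (1 - c)).
    jac = [1.0 / a, 1.0, 1.0 / (c * (1.0 - c))]
    return [[jac[r] * cov[r][s] * jac[s] for s in range(3)] for r in range(3)]

# Item 1 from Tables 1 and 2: a = .39, c = .20, with its (a, b, c) covariance block.
cov_item1 = [[0.059, 0.233, 0.001],
             [0.233, 1.212, 0.027],
             [0.001, 0.027, 0.008]]
cov_t = transformed_covariance(0.39, 0.20, cov_item1)
mean_t = [math.log(0.39), -1.59, math.log(0.20 / 0.80)]   # transformed mean vector
print(mean_t)
print(cov_t)
```

The transformed mean and covariance are what a sampler would draw from before mapping each draw back to $(a_j, b_j, c_j)$ with the exponential and inverse-logit functions.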



[[Figure 2 about here]] 

For most of the 19 items, the 3PL approximation captured the nonparametric
approximation quite well. The only discrepancies encountered were for items
with fairly high $a$’s, such as Item 19. For these highly discriminating items,
the fitted curves tended to be slightly flatter than the nonparametric curves.
The discrepancies were slightly more pronounced when the ERFs were recalculated
with a rectangular weighting function, indicating that they are related to the
inability of the 3PL form to capture the pattern of curvature in the tails of
the theta range.

Figure 3 presents a comparison of results regarding Bayesian inference about
$\theta$ for a sample of 100 students. The plots show posterior means and
associated posterior standard deviations (PSDs) calculated using point
estimates of the item parameters, nonparametric ERFs, and fitted ERFs. In each
case, the multiple imputation solution (Equations 12 and 13) is employed as a
standard of evaluation, as it is nonparametric and accounts for dependencies
among the parameters of different items. As can be seen, the various methods
for handling uncertainty about $\beta_j$ have had negligible effect on the
calculation of posterior means. However, the effect on the associated PSDs is
quite pronounced. As would be expected, the practice of using point estimates
of item parameters as if they were known true values seriously understates the
uncertainty associated with examinees’ $\theta$s. This effect is less
pronounced when ERFs are used. Table 3 presents average PSDs calculated for the
multiple imputation approach, the nonparametric and fitted ERFs, and the point
estimates. In this example, the PSD of a typical examinee’s $\theta$, when
calculated using point estimates of the item parameters, was understated by
about 10%. This can be attributed to ignoring uncertainty about $B$ altogether.
For the nonparametric and fitted ERFs the understatement was only 3.6% and 3.9%
respectively. This improvement is obtained by incorporating uncertainty about
$B$ item by item, but ignoring dependencies across items. In terms of variance,
about 60% of the typically-ignored variance was accounted for in this example
through the use of ERFs.
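The percentages quoted in this paragraph follow directly from the entries of Table 3, as the short calculation below shows. The function names are ours; the numbers are copied from the table.

```python
# Values copied from Table 3: (average posterior variance, average posterior S.D.).
table3 = {
    "Multiple Imputation": (0.2151, 0.4585),
    "Nonparametric ERF":   (0.1995, 0.4418),
    "Fitted ERF":          (0.1977, 0.4406),
    "Point Estimates":     (0.1743, 0.4113),
}

def pct_decrease_in_sd(method, baseline="Multiple Imputation"):
    """Percent by which a method's average PSD falls short of the baseline's."""
    base_sd = table3[baseline][1]
    return 100.0 * (base_sd - table3[method][1]) / base_sd

def share_of_ignored_variance_recovered(method):
    """Fraction of the variance gap between point estimates and the baseline that a method closes."""
    v_mi = table3["Multiple Imputation"][0]
    v_pt = table3["Point Estimates"][0]
    return (table3[method][0] - v_pt) / (v_mi - v_pt)

for name in ("Nonparametric ERF", "Fitted ERF", "Point Estimates"):
    print(name, round(pct_decrease_in_sd(name), 1),
          round(share_of_ignored_variance_recovered(name), 2))
```

The nonparametric and fitted ERFs recover roughly 62% and 57% of the variance gap respectively, consistent with the "about 60%" summary.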
[[Figure 3 about here]]

[[Table 3 about here]] 

Conclusion 

As increasingly ambitious applications push item response theory closer to the
boundaries of its applicability, increasingly strenuous efforts are required to
deal with issues of uncertainty, both as to model fit and knowledge of
parameters within the model. This paper addresses a problem of the latter type,
namely, dealing with uncertainty about item parameters. Fortunately,
statisticians’ recent interest in numerical and Bayesian approaches to such
problems provides a variety of tools, each with its own strengths and
weaknesses to be matched with the purposes and characteristics of applications.
Expected response functions (ERFs) account for uncertainty that is usually
ignored, in a way that allows us to employ familiar formulas for known item
response functions, and even to apply the same formulas but with attenuated
parameter estimates. This would be especially convenient in item-banking and
adaptive-testing applications, in which tests are assembled from collections of
pre-calibrated items. Uncertainty about item parameters (under the assumed
model!) would be implicit in the parameter estimates available at a given point
in time, no additional steps would be required at the point of calculating
scores for individual examinees, and improved knowledge about item parameters
would merely require updating a file of ERF parameters.
Table 1

Statistics and Point Estimates of Item Parameters (a, b, c)
for 19 NAEP Mathematics Items

Item   Prop. correct   r-bis   Est. a   Est. b   Est. c
  1        .78          .35      .39    -1.59     .20
  2        .92          .63      .90    -1.98     .20
  3        .78          .45      .55    -1.23     .19
  4        .85          .45      .77    -1.45     .20
  5        .79          .45      .63    -1.17     .20
  6        .91          .47      .54    -2.60     .20
  7        .65          .65     1.20     -.26     .17
  8        .86          .64      .99    -1.37     .18
  9        .72          .62     1.22     -.50     .19
 10        .67          .61     1.27     -.26     .20
 11        .48          .56     1.96      .53     .23
 12        .77          .44      .60    -1.06     .20
 13        .85          .59      .95    -1.30     .19
 14        .51          .69     1.89      .20     .15
 15        .55          .49      .86      .19     .18
 16        .43          .41      .65      .81     .16
 17        .30          .56     1.10     1.04     .12
 18        .53          .44     2.59      .56     .30
 19        .21          .63     3.03     1.09     .10
Table 2

Variances and Covariances of Item Parameter Estimates
for 19 NAEP Mathematics Items

Item   Var(a)   Cov(a,b)   Var(b)   Cov(a,c)   Cov(b,c)   Var(c)
  1      .059     .233      1.212     .001       .027      .008
  2      .435     .632      1.086     .003       .012      .008
  3      .069     .141       .509     .002       .019      .008
  4      .264     .320       .536     .004       .016      .008
  5      .119     .174       .401     .004       .020      .008
  6      .078     .325      1.673     .001       .017      .008
  7      .300     .055       .073     .011       .010      .006
  8      .208     .179       .251     .003       .011      .007
  9      .259     .094       .108     .010       .012      .007
 10      .339     .074       .077     .016       .012      .007
 11     2.513     .056       .058     .053       .008      .006
 12      .114     .181       .527     .004       .021      .008
 13      .280     .261       .354     .004       .012      .007
 14     1.519     .073       .041     .034       .007      .004
 15      .203     .037       .132     .011       .015      .007
 16      .118    -.043       .201     .009       .014      .006
 17      .366    -.075       .104     .012       .005      .003
 18    10.944     .232       .058     .129       .009      .008
 19    11.626    -.210       .051     .042       .002      .002
Table 3

Average Posterior Variances and Standard Deviations
for a Sample of 100 Examinees

Estimation Method      Average Posterior   Average Posterior   % Decrease
                       Variance            S.D.
Multiple Imputation        0.2151              .4585               -
Nonparametric ERF          0.1995              .4418              3.6
Fitted ERF                 0.1977              .4406              3.9
Point Estimates            0.1743              .4113             10.3
Figure Captions

Figure 1. 100 Draws from Item Parameter Posterior Distributions for Four Items. 
Figure 2. Item Response Functions for the Four Items. 

Figure 3. Scatterplots of Posterior Means and Standard Deviations for 100 Examinees. 
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Appendix A 

Pseudolikelihood Estimation of dfrom Marginalized Likelihoods 

The first section below paraphrases Arnold and Strauss’s (1991; denoted AS 
below) framework and results on pseudolikelihood estimation. The reader is referred to 
AS for regularity conditions, proofs, and examples. The second section shows how this 
framework accommodates likelihood estimation of 9 using the product of expected 
response curves. 

Pseudolikelihood Estimation 



Let $(X_1, \ldots, X_N)$ represent $N$ iid $n$-dimensional observations with common joint
density $f(x;\theta)$, where $\theta$ is an element of a $p$-dimensional parameter domain $\Theta$.
Denote by $S$ the set of all $n$-dimensional vectors consisting of 0’s and 1’s, with at least
one 1. For a particular $s$ in $S$, the random vector $X_i^{(s)}$ contains the coordinates $X_{ij}$ of
$X_i$ for which $s_j = 1$. For example, if $X_i = (X_{i1}, X_{i2}, X_{i3})$ and $s = (1,0,1)$, then
$X_i^{(s)} = (X_{i1}, X_{i3})$. The density of $X_i^{(s)}$ will be denoted $f_s(x^{(s)};\theta)$, although it may
depend on only some of the components of $\theta$. Let $\delta = \{\delta_s;\ s \in S\}$ be a vector of
$2^n - 1$ real numbers, not all zero, corresponding to the elements of $S$. The pseudolikelihood
$PL(\delta,\theta)$ of the data is defined by

\[ PL(\delta,\theta) = \prod_{s \in S} \left[ \prod_{i=1}^{N} f_s(x_i^{(s)};\theta) \right]^{\delta_s}. \tag{A1} \]

Equivalently, in terms of logarithms,

\[ \log PL(\delta,\theta) = \sum_{s \in S} \delta_s \sum_{i=1}^{N} \log f_s(x_i^{(s)};\theta). \]



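A pseudolikelihood($\delta$) estimate of $\theta$ is a value of $\theta$ that maximizes (A1). Under
regularity conditions, (A1) can be maximized by solving the pseudolikelihood equations,
obtained by differentiating the log of the pseudolikelihood with respect to the elements of $\theta$
and setting them to zero; that is,

\[ \frac{\partial}{\partial\theta_k} \log PL(\delta,\theta) = \sum_{s \in S} \delta_s \sum_{i=1}^{N} \frac{\partial}{\partial\theta_k} \log f_s(x_i^{(s)};\theta) = 0 \quad \text{for } k = 1, \ldots, p. \tag{A2} \]
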
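If regularity conditions given in AS for $f$ and the $f_s$’s are satisfied, then with
probability tending to 1 as $N \to \infty$ the pseudolikelihood equation (A2) has a root $\hat\theta_N$ such
that $\hat\theta_N \to \theta_0$, the true parameter value; i.e., the pseudolikelihood estimator is
consistent. (The regularity conditions ensure, among other things, that the choice of $\delta$ does
not omit any elements of a multidimensional $\theta$ from $PL(\delta,\theta)$.) Moreover, the
pseudolikelihood estimator is asymptotically normal. AS give an expression for its large-
sample variance, which depends on the choice of $\delta$ and is bounded from below by the
large-sample variance of the MLE. In the univariate case, any consistent sequence
$\{\hat\theta_N\}$ of roots of (A2) satisfies

\[ \sqrt{N}\,(\hat\theta_N - \theta_0) \xrightarrow{d} N\!\left(0,\ \frac{K^{\delta}(\theta_0)}{[J^{\delta}(\theta_0)]^{2}}\right), \tag{A3} \]

where

\[ K^{\delta}(\theta) = \sum_{s,s' \in S} \delta_s \delta_{s'}\, E_\theta\!\left[ \frac{\partial}{\partial\theta} \log f_s(X^{(s)};\theta)\ \frac{\partial}{\partial\theta} \log f_{s'}(X^{(s')};\theta) \right] \]

and

\[ J^{\delta}(\theta) = -\sum_{s \in S} \delta_s\, E_\theta\!\left[ \frac{\partial^2}{\partial\theta^2} \log f_s(X^{(s)};\theta) \right]. \]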



Application to Expected Response Curves 

The above results can be applied to the estimation of examinee ability under an IRT
model. Let $X = (X_1, \ldots, X_n)$ represent a response vector from an examinee to $n$ items,
governed by the IRT model $F_j(\theta) = P(X_j = 1 \mid \theta, \beta_j)$ with

\[ P(X = x \mid \theta, B) = \prod_{j=1}^{n} [F_j(\theta)]^{x_j}\, [1 - F_j(\theta)]^{1 - x_j}. \]

Let knowledge about $B$ be expressed as $p(B)$. The marginalized likelihood function for
maximum likelihood estimation of $\theta$ is

\[ P(X = x \mid \theta) = \int \prod_{j=1}^{n} [F_j(\theta)]^{x_j}\, [1 - F_j(\theta)]^{1 - x_j}\, p(B)\, dB. \]






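For pseudolikelihood estimation, define $\delta$ as a selector for the subspace of $S$
consisting of vectors that isolate a single item response; i.e.,

\[ \delta_s = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} s_j = 1, \\ 0 & \text{otherwise.} \end{cases} \]

The pseudolikelihood $PL(\delta,\theta)$ corresponding to one observed response vector (i.e., $N=1$)
is obtained by specializing (A1) as follows:

\[ PL(\delta,\theta) = \prod_{j=1}^{n} \int [F_j(\theta)]^{x_j}\, [1 - F_j(\theta)]^{1 - x_j}\, p(\beta_j)\, d\beta_j = \prod_{j=1}^{n} [F_j^*(\theta)]^{x_j}\, [1 - F_j^*(\theta)]^{1 - x_j}, \]

where $F_j^*(\theta)$ is the expected response curve for Item $j$.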

If knowledge about items is independent, i.e., $p(B) = \prod_j p(\beta_j)$, then the asymptotic
variance of the pseudolikelihood estimate (A3) simplifies to the usual inverse of the sum
of Fisher information over items, as calculated with expected response curves.

The AS consistency results imply the asymptotic equivalence of maximizing values 
of the full marginal likelihood, which does take dependencies among parameters from 
different items into account, and the product of the expected response curves, which does 
not, for large samples of response vectors for the same $\theta$. Since we typically observe only
one response vector per examinee in practical work, small-sample behavior remains to be 
examined. 
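
The estimator discussed in this appendix can be illustrated with a small Python sketch that maximizes the product of expected response curves over a grid of ability values. The grid search, the function names, and the logistic stand-in curves are all assumptions made for illustration; nothing here reproduces the authors' software.

```python
import numpy as np

def log_pseudolikelihood(erf, x):
    """Log of the product of expected response curves at every grid point.

    erf : array (n_items, M), expected response curves F*_j on a theta grid
    x   : array (n_items,), observed 0/1 responses
    """
    erf = np.clip(np.asarray(erf, dtype=float), 1e-12, 1.0 - 1e-12)
    x = np.asarray(x, dtype=float)[:, None]
    return np.sum(x * np.log(erf) + (1.0 - x) * np.log(1.0 - erf), axis=0)

def pl_estimate(theta_grid, erf, x):
    """Grid value of theta that maximizes the log pseudolikelihood."""
    return theta_grid[np.argmax(log_pseudolikelihood(erf, x))]

# Hypothetical stand-in curves (plain logistic shapes), for illustration only.
theta_grid = np.linspace(-3.0, 3.0, 61)
difficulty = np.array([-1.0, 0.0, 1.0])
erf = 1.0 / (1.0 + np.exp(-(theta_grid[None, :] - difficulty[:, None])))
theta_hat = pl_estimate(theta_grid, erf, x=[1, 1, 0])
```

With a single response vector the maximizer runs to the edge of the grid whenever all responses are correct or all are incorrect, which is one practical face of the small-sample question raised above.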






Appendix B 

Program Documentation 

This appendix provides detailed documentation for two computer programs: EXPRESFN
and PLOTIRF. The EXPRESFN program computes EXPected RESponse FuNctions, both
nonparametric and fitted, for a set of items, given a set of multivariate normal item parameter
posterior distributions specified in terms of a set of mean vectors and an associated set of
independent variance-covariance matrices. The PLOTIRF program provides plots of all
estimated curves.



The EXPRESFN Program 

The EXPRESFN program assumes that item responses may be modeled using a 2PL or a
3PL IRT model. Both nonparametric and fitted expected response functions are estimated for
all items. The procedures used to estimate the fitted expected response functions are very
similar to the procedures employed in LOGIST. The program also computes EAP ability
estimates and standard errors for a set of examinees using the nonparametric and fitted
expected response functions as well as the point estimates of the item parameter means.
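
As an illustration of the EAP computation mentioned above, the following Python sketch evaluates a posterior mean and standard deviation for ability on a grid, given tabulated response curves and a normal prior. The grid quadrature, the function name, and the stand-in curves are assumptions; the documentation does not describe how EXPRESFN carries out this step internally.

```python
import numpy as np

def eap_estimate(theta_grid, curves, x, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and S.D. of theta on a grid, with a normal prior.

    curves : array (n_items, M), response curves tabulated on theta_grid
    x      : array (n_items,), observed 0/1 responses
    """
    curves = np.clip(np.asarray(curves, dtype=float), 1e-12, 1.0 - 1e-12)
    x = np.asarray(x, dtype=float)[:, None]
    log_lik = np.sum(x * np.log(curves) + (1.0 - x) * np.log(1.0 - curves), axis=0)
    log_prior = -0.5 * ((theta_grid - prior_mean) / prior_sd) ** 2
    log_post = log_lik + log_prior
    weights = np.exp(log_post - log_post.max())   # subtract max for stability
    weights /= weights.sum()
    mean = float(np.sum(weights * theta_grid))
    sd = float(np.sqrt(np.sum(weights * (theta_grid - mean) ** 2)))
    return mean, sd

# Hypothetical stand-in curves, for illustration only.
theta_grid = np.linspace(-3.0, 3.0, 31)
difficulty = np.array([-1.0, 0.0, 1.0])
curves = 1.0 / (1.0 + np.exp(-(theta_grid[None, :] - difficulty[:, None])))
eap_high, sd_high = eap_estimate(theta_grid, curves, [1, 1, 1])
eap_low, sd_low = eap_estimate(theta_grid, curves, [0, 0, 0])
```

The same function can be called with point-estimate IRFs, nonparametric ERFs, or fitted ERFs in `curves`, which is the comparison the program is described as reporting.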

The program has the following options: 

1. The user may specify either a 2PL or a 3PL model. 

2. The input point estimates of the item parameter means and variance-covariance 
matrices may be specified on the (a,b,c) scale or on the transformed (log(a),b,logit(c)) scale. 

3. The range of the θ grid and the total number of grid points may be specified.

4. In computing the fitted expected response function, the weighting distribution may be
either normal or rectangular and the sum of the weights, i.e., the total number of
pseudo-examinees, may be specified.

5. In estimating the item parameters for the fitted expected response functions, the
iterative procedure requires initial item parameter estimates. The program supplies default
values for these initial estimates. However, the user may set all initial a’s to a given value,
all initial c’s to a specified value or may supply the initial values.

6. To control the problem of estimating c’s when the fitted expected response function
becomes asymptotic below the minimum ability of interest, one may fix the c’s at a common
c for items where the estimated b-2/a is less than some criterion, fix all c’s at a common c,
put a beta prior on the c’s and estimate the mean of the prior, or put a beta prior on the c’s,
fixing the mean at a value specified by the user. The common c may be fixed or estimated.

7. Abilities may be estimated for an existing set of item responses or for a set of
responses generated by the program for a random sample of examinees drawn from either a
normal or rectangular distribution. The generated data can be used to assess the differences
between the abilities estimated using the three item response functions.



Nonparametric Expected Response Function 

The nonparametric expected response function estimation procedure requires point 
estimates of the item parameters and associated variance-covariance matrices expressed on a 
transformed scale. If the input data has not already been transformed, then the following 
transformations will be applied: 



a*_j = log(a_j)
b*_j = b_j
c*_j = log(c_j/(1-c_j))

var(a*_j) = var(a_j)/a_j^2
cov(a*_j,b*_j) = cov(a_j,b_j)/a_j
cov(a*_j,c*_j) = cov(a_j,c_j)/(a_j c_j (1-c_j))
var(b*_j) = var(b_j)
cov(b*_j,c*_j) = cov(b_j,c_j)/(c_j (1-c_j))
var(c*_j) = var(c_j)/(c_j (1-c_j))^2
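
These transformations are the usual first-order (delta method) approximations, and they can be written compactly as a Jacobian sandwich. The Python sketch below is an illustration only: the function name and the numeric values are hypothetical, and the (a, b, c) ordering of the matrix is an assumption.

```python
import numpy as np

def transform_item(a, b, c, cov):
    """Map 3PL estimates and their covariance to the (log a, b, logit c) scale.

    cov is the 3x3 variance-covariance matrix, assumed ordered (a, b, c).
    """
    mean_t = np.array([np.log(a), b, np.log(c / (1.0 - c))])
    # Derivatives of log(a), b, and logit(c) with respect to a, b, and c.
    jac = np.diag([1.0 / a, 1.0, 1.0 / (c * (1.0 - c))])
    cov_t = jac @ np.asarray(cov, dtype=float) @ jac.T
    return mean_t, cov_t

# Hypothetical values for one item, for illustration only.
cov = np.array([[0.040, 0.010, 0.0020],
                [0.010, 0.020, 0.0010],
                [0.002, 0.001, 0.0009]])
mean_t, cov_t = transform_item(a=1.2, b=0.5, c=0.2, cov=cov)
```

Each entry of `cov_t` matches the corresponding formula in the list, for example `cov_t[0, 2]` equals cov(a,c)/(a c (1-c)).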



A grid of M θ values is specified from θ_1 to θ_M. Then a random sample of K parameter
values is drawn from the multivariate normal distribution with means a*_j, b*_j, c*_j and with the
transformed variance-covariance matrix. If the point estimate of c_j is 0, then c_j is held
fixed and only log a_j and b_j are sampled. The c_j for this item will also not be estimated for the
fitted ERF. If the point estimate for c_j is less than or equal to .001, the mean for c_j used for
the multivariate normal is set to the standard error of c. F*_j(θ_m) is computed for each of the
M values of θ for each of the K IRF’s. F**_j is the average of the F*_j(θ_m)’s over the K IRF’s.
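
The sampling and averaging steps can be sketched in Python as follows. This is not the EXPRESFN code: the scaling constant D = 1.7, the NumPy random generator, and the example means and variances are assumptions, and the special handling of c estimates at or near zero is omitted.

```python
import numpy as np

def irf_3pl(theta, a, b, c, D=1.7):
    """3PL item response function; the scaling constant D=1.7 is an assumption."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def nonparametric_erf(mean_t, cov_t, theta_grid, K=100, seed=275927):
    """Average K sampled IRFs pointwise over the theta grid.

    mean_t, cov_t : mean vector and covariance on the (log a, b, logit c) scale
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean_t, cov_t, size=K)   # shape (K, 3)
    a = np.exp(draws[:, 0])                                  # back to the a scale
    b = draws[:, 1]
    c = 1.0 / (1.0 + np.exp(-draws[:, 2]))                   # back to the c scale
    curves = irf_3pl(theta_grid[None, :], a[:, None], b[:, None], c[:, None])
    return curves.mean(axis=0)                               # shape (M,)

# Hypothetical item on the transformed scale, for illustration only.
theta_grid = np.linspace(-3.0, 3.0, 31)
mean_t = np.array([np.log(1.2), 0.5, np.log(0.2 / 0.8)])
cov_t = np.diag([0.03, 0.02, 0.04])
erf = nonparametric_erf(mean_t, cov_t, theta_grid)
```

The defaults K=100, 31 grid points on [-3, 3], and the seed value mirror the documented program defaults, although a different random number generator will not reproduce the program's draws.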
Fitted Expected Response Function



The nonparametric ERF is the input for estimating the parameters of the fitted 
expected response function. The abilities are fixed at the values in the grid. A sample of 
pseudo-examinees is generated to weight the grid values according to a weighting distribution 
specified by the user. The distribution may be either normal or rectangular. If normal the 
user may specify the mean and standard deviation. The user specifies the number of 
examinees for the sample. Newton’s method is used to solve for the corrections to the 
estimated parameters by solving the likelihood equations. Since there are no omits, this
procedure uses the expected values of the second derivatives, which removes any possibility
of nonpositive definite matrices. If an item has a zero determinant, the item is removed from
further estimation and the parameters are set to the values before the zero determinant.



The iteration procedure requires initial values for the item parameters. The default
value for a is one. The default value for c is 1/(# choices) - .05. The default value for b is a
function of the proportion correct. The formulas to compute the default values of b are:




where hj is given by the following equations 







and N is the number of pseudo-examinees. 

The procedure estimates the parameters for one item at a time until the relative change 
in a is less than .001 if a is being estimated. If a is fixed, the procedure iterates until the 
change in b is less than .001. One pass through all of the items constitutes a stage. In the 
first stage the c’s are held fixed. In the second and following stages the c’s are estimated 
unless a two parameter model is requested. If all c’s are being estimated, or there is a prior 
on the c’s, stages are repeated until the change in the likelihood is less than .02% between 
stages. 



If no prior is imposed on the c’s and the poorly estimated c’s are restricted to a 
common c value, the following procedure is used: 

In the second and third stages the c’s for all items are estimated.

At the end of the third stage, the c’s for items with b-2/a less than the criterion for
fixing the c (CRITFIXC) are fixed at a common c value. If all c’s are to be
fixed at a common c value, they are set to the common c value at this point.

The common c value is then estimated once per stage until the change in the common
c is less than the standard error of the common c estimate for two successive
stages. Only the items with c fixed at the common c are estimated in these
stages.

The common c is then fixed and all items are again estimated until the criterion
function increases by less than .02%.

If a prior on c is requested and the mean is estimated, the mean is computed as the 
average of the c’s at the end of each stage. Note: the beta prior is included in the 
computation of the likelihood and since the mean isn’t actually a maximum likelihood 
estimate of the mean, the likelihood may not increase uniformly. To prevent premature 
stopping of the estimation procedure in this situation, the procedure will continue until the 
maximum difference between IRF’s between stages is less than .001. The difference is
computed for 5 abilities from -2 to 2 at intervals of 1.

The a parameter is restricted to a range of .01 to 99, c to a range of 0. to .99. The
maximum amount that a parameter may change in any iteration is restricted. The amount for
a is .1 times the previous value for a plus .2, b is .1 times previous value of b plus .4, and c
is .06.
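
The fitting step described in this section can be sketched in Python for the simplest case, in which c is held fixed and only a and b are estimated. The sketch uses Fisher scoring (Newton steps with expected second derivatives) on pseudo-examinee counts at the grid points. It is an assumption-laden illustration, not the EXPRESFN code: D = 1.7, simple rounding of the counts, the use of |b| in the step limit, and a stopping rule that checks the changes in both a and b are all choices made here.

```python
import numpy as np

def pseudo_examinee_counts(theta_grid, n_total=3100, mean=0.0, sd=1.0):
    """Apportion pseudo-examinees to grid points by a normal weighting distribution."""
    w = np.exp(-0.5 * ((theta_grid - mean) / sd) ** 2)
    return np.rint(n_total * w / w.sum())          # whole examinees per point

def fit_erf_fixed_c(theta_grid, erf, counts, c, a=1.0, b=0.0, D=1.7,
                    tol=1e-3, max_iter=100):
    """Fit a and b of a 3PL curve to a tabulated ERF by Fisher scoring, c fixed."""
    for _ in range(max_iter):
        L = 1.0 / (1.0 + np.exp(-D * a * (theta_grid - b)))
        P = c + (1.0 - c) * L
        # Derivatives of P with respect to a (row 0) and b (row 1).
        dP = np.vstack([(1.0 - c) * L * (1.0 - L) * D * (theta_grid - b),
                        -(1.0 - c) * L * (1.0 - L) * D * a])
        wgt = counts / (P * (1.0 - P))
        score = dP @ (wgt * (erf - P))             # first derivatives of log-likelihood
        info = (dP * wgt) @ dP.T                   # expected second derivatives
        step = np.linalg.solve(info, score)
        # Limit the change per iteration, following the limits quoted in the text.
        max_a, max_b = 0.1 * a + 0.2, 0.1 * abs(b) + 0.4
        da = float(np.clip(step[0], -max_a, max_a))
        db = float(np.clip(step[1], -max_b, max_b))
        a_new = float(np.clip(a + da, 0.01, 99.0))
        converged = abs(a_new - a) / a < tol and abs(db) < tol
        a, b = a_new, b + db
        if converged:
            break
    return a, b

# Hypothetical check: recover a known curve from its own tabulated values.
theta_grid = np.linspace(-3.0, 3.0, 31)
counts = pseudo_examinee_counts(theta_grid)
target = 0.2 + 0.8 / (1.0 + np.exp(-1.7 * 1.2 * (theta_grid - 0.5)))
a_hat, b_hat = fit_erf_fixed_c(theta_grid, target, counts, c=0.2, tol=1e-8)
```

Because the tabulated ERF values play the role of observed proportions correct, the expected information matrix is positive definite whenever the curve is not flat over the grid, which is the property the text attributes to using expected second derivatives.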

Input 

The input to the program consists of a sysin file containing file names for the input 
and output files and parameters for controlling the procedure and a file containing the point 
estimates for the parameters and the variance-covariance of these estimates. If abilities are to 
be estimated for a group of examinees, the file of their responses is also read. 

The Sysin File.

Record Set 1:

The first set of records in the sysin file define the input and output files.

The set contains one record for each file to be defined. The last record in this set must be 
blank. The format for the file definition card is: 

col 1 F 

col 3 - 4 Unit number 

col 6-45 File name, with all qualifiers 
The files to be specified are:

Input files:

Unit 5     File containing the sysin dataset.

Unit 10    File containing, for each item, the point estimates and the
           variance-covariance matrix. They may be either on the a,b,c
           scale or on the log a, b, logit c scale but both the point estimates
           and the variance-covariance matrix must be on the same scale.

Unit 11    Input file containing the examinee responses if abilities are to be
           estimated for an existing item response file.

Output files:

Unit 6     Printed output file.

Unit 7     Item parameter output in LOGIST7 format. The abilities written
           are the pseudo-abilities used to estimate the fitted ERF’s.

Unit 12    Binary scratch output file, used to temporarily store the
           nonparametric ERF’s and then the examinee responses.

Unit 13    Output file containing the point estimate item parameters, the
           fitted ERF, and the nonparametric ERF for each item.

Unit 14    Output file containing the sample of item response functions, if
           it was requested that the sample be saved.

Unit 15    Output file containing ability estimates, standard errors, and item
           responses, if abilities are estimated.



Record Set 2:

In record set 2, the options for running the procedure are specified. Only those
options where the default says "Required" must be specified. The required parameters are the
title, the number of items, the number of choices per item, and the format for reading the
point estimates file. Defaults are supplied for all of the other parameters. The parameters are
specified by entering the parameter name in positions 1 through 11 of the record and the
value in positions 13 through 20. Formats are entered in positions 13 - 80. Right justify all
integer values. The last record in this set must be blank.
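
The fixed-column layouts of record sets 1 and 2 can be read with a few lines of Python. The sketch below is only an illustration of the column positions described above: the function names and file names are hypothetical, values are returned as strings, and it assumes that nothing other than the value appears to the right of column 12 on a parameter record.

```python
def parse_file_records(records):
    """Read file definition records: 'F' in col 1, unit in cols 3-4, name in cols 6-45.

    Consumes records up to and including the blank record that ends the set.
    """
    files = {}
    for line in records:
        if not line.strip():
            break
        if line[0] != "F":
            raise ValueError("file definition record must start with F")
        files[int(line[2:4])] = line[5:45].strip()
    return files

def parse_parameter_records(records):
    """Read parameter records: name in cols 1-11, value starting in col 13.

    Consumes records up to and including the blank record that ends the set.
    """
    params = {}
    for line in records:
        if not line.strip():
            break
        params[line[:11].strip()] = line[12:80].strip()
    return params

# Hypothetical sysin contents, for illustration only.
sysin = iter([
    "F  5 sysin.dat",
    "F 10 items.dat",
    "",
    "TITLE       EXAMPLE RUN",
    "#ITEMS            19",
    "",
])
files = parse_file_records(sysin)
params = parse_parameter_records(sysin)
```

Passing a single iterator to both functions lets the second pick up where the first stopped, which matches the blank-record convention that separates the two sets.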



Parameter input

Parameter    Description / Options                                 Default

TITLE        Title for the run.                                    Required

#ITEMS       Number of items. (Maximum 800)                        Required

SEED         Random number seed. Integer between 0 and             275927
             1048576.

DEBUG        Debugging printout?                                   NO

ITEMIDEN     Read in 8-character item identification codes?        NO

GENFDCC      Is c fixed in var/cov, i.e., var/cov for c are 0?     NO
             If so, c will be fixed in fitting the ERF.

IFTRANS      Are the input point estimates and var/cov matrix      NO
             on the log a, b, logit c scale?

FMTVAR       Format for reading point estimates and var/cov        Required
             matrix. The values are read in the following
             order: item number, a, b, c, var(a), cov(a,b),
             cov(a,c), var(b), cov(b,c), var(c). If abilities
             are to be estimated for a group of examinees,
             the item number must be the sequence number of
             the item in the record of item responses.

#SAMPIRF     Number of item parameter values to sample.            100
             (Maximum 1,000)

MINTHETA     Minimum ability for θ grid.                           -3.

MAXTHETA     Maximum ability for θ grid.                           3.

#ABILGRP     Number of points in θ grid. (Maximum 201)             31

WEIGHTFN     Weighting distribution for fitting ERF. Enter         NORMAL
             RECTANGULAR or NORMAL.

WEIGHTMN     If weighting distribution NORMAL, specify mean.       0.

WEIGHTSD     If weighting distribution NORMAL, specify             1.
             standard deviation.

#ERFEXAM     Number of pseudo-examinees for estimating the         3100
             fitted ERF’s. These will be apportioned by the
             weighting distribution to the M θ grid points and
             adjusted so that there is an integral number of
             examinees at each grid point.

SAVESAMP     Save the sample of item response functions to a       NO
             file?

READA        Read in initial a’s?                                  NO

READB        Read in initial b’s?                                  NO

READC        Read in initial c’s?                                  NO

PRIORC       Prior on c?                                           0
             0 - no, estimate all c’s, don’t fix any at the
                 common c value.
             1 - no, fix c’s at a common c (COMCx) if
                 b-2/a < CRITFIXC. Estimate COMCx.
             2 - no, fix all items at a common c. Estimate
                 COMCx.
             3 - yes, estimate the mean of prior.
             4 - yes, fix the mean of prior.

CRITFIXC     Criterion for fixing c, if no prior requested and     -2.5
             PRIORC = 1.

AINIT        Initial a value, if READA is NO.                      1.

AMAX         Maximum a.                                            99.0

PARMCODE     What parameters are to be estimated.                  3
             -1 - read in parmcode for each item.
             Otherwise set parameter code for all items to the
             specified code. The definitions of the codes are:
                 code    parameters estimated
                 2       a,b
                 3       a,b,c

CHOICESx     Number of choices per item. x indicates a sequence    Required
             number for different item types. Specify a
             different CHOICESx for each item type. For
             example, if a test has 4 and 5 choice items, set
             CHOICES1 to 4 and CHOICES2 to 5. x must be
             between 0 and 98.

CINITx       Initial c for the CHOICESx items.                     1/CHOICESx - .05

COMCx        If no prior on c, common c value for the              1/CHOICESx - .05
             CHOICESx items. If prior on c, mean c of prior
             for the CHOICESx items.

N-INFx       This is only used if there is a prior on c. It is     20
             the weight for the prior on c in terms of the
             number in a hypothetical group of examinees at
             minus infinity. It controls the variance of the
             beta prior. A separate N-INFx must be specified
             for every CHOICESx alternatives.

CHIx         Maximum c.                                            .99

ESTABIL      Estimate abilities?                                   NO

#EXAMINEE    Number of examinees for which abilities are to be     20
             estimated if ESTABIL=YES. (Maximum 10,000)

PRIORMN      Prior mean of p(θ).                                   0.

PRIORSD      Prior standard deviation of p(θ).                     1.

GENRESP      Generate artificial data, abilities and item          YES
             responses.

DISTABIL     If generating artificial data, specify type of        RECTANGULAR
             ability distribution to generate, either
             ’RECTANGULAR’ or ’NORMAL’.

DISTMN       If DISTABIL is ’NORMAL’, specify the mean of          0.
             the distribution.

DISTSD       If DISTABIL is ’NORMAL’, specify the standard         1.
             deviation of the distribution.

RECTMIN      If DISTABIL is ’RECTANGULAR’, specify                 -3.
             minimum ability for distribution.

RECTMAX      If DISTABIL is ’RECTANGULAR’, specify                 3.
             maximum ability for distribution.

FMTRESP      If reading in examinee responses, specify format      Required if
             for reading the item responses. They will be          ESTABIL=YES
             selected as specified by item number read from        and
             the point estimates file. They are read in            GENRESP=NO.
             integer format. As many integer fields must be
             specified as the maximum item number read from
             the point estimates. For example, if the item
             numbers read from the point estimates are 1, 5,
             and 10, the format must specify reading in 10
             integer fields.
Additional input:

If PARMCODE = -1, read in a parameter code for each item with Record set 3.

If more than one CHOICESx read, specify the items for each number of choices in
Record set 4.

If ITEMIDEN requested, read in item identification in Record set 5.

Record set 3. 

This record set is only required if PARMCODE is set to -1 to read in a parameter 
code for each item. 

col 1 - 8 "PARMCODE"

col 9 - 10 Sequence number for this PARMCODE record.

col 11 - 80 Parameter codes for the items in (35I2) format.

Repeat for as many records as necessary, increasing the sequence number for each
record. For example, for items 36-40, the sequence number must be 2.

Record set 4. 

This record set is only necessary if more than one CHOICESx is specified. It is used
to specify the number of choices for each item.

col 1 - 8 "CHOICESx" where x corresponds to the CHOICESx specified on the
parameter records.

col 9 - 10 Sequence number for this CHOICESx record.

col 11 - 80 Item numbers of the items that have the number of choices specified by
CHOICESx, read in (10I5) format. A sequence of items can be
specified by specifying the first number in the sequence followed by
the negative of the last number in the sequence.

Enter as many CHOICESx records as necessary, increasing the sequence number for
each record. Do not split a sequence across two records. If the beginning of a
sequence would be the last field of a record, leave the last field blank and start
the sequence on the next record.
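
The "first number, then negative of the last number" convention for item sequences can be expanded with a short Python helper. This is a sketch for illustration: the function name is hypothetical, it reads every 5-column field between columns 11 and 80 rather than enforcing a particular field count, and it skips blank fields.

```python
def expand_item_fields(record, first_col=11, width=5, last_col=80):
    """Read integer fields from a record and expand 'first, -last' sequences."""
    text = record.ljust(last_col)[first_col - 1:last_col]
    values = [int(text[i:i + width]) for i in range(0, len(text), width)
              if text[i:i + width].strip()]
    items = []
    for v in values:
        if v < 0:
            # A negative entry closes a sequence opened by the previous entry.
            items.extend(range(items[-1] + 1, -v + 1))
        else:
            items.append(v)
    return items

# Hypothetical CHOICESx record: items 1 through 5, item 8, and items 10 through 12.
record = "CHOICES1 1" + "    1   -5    8   10  -12"
items = expand_item_fields(record)
```

Because a sequence may not be split across records, each record can be expanded on its own and the resulting lists concatenated.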



Record set 5. 

If ITEMIDEN is "YES", this set is required to read in the 8-character item
identification for each item.

col 1 - 8 "ITEMIDEN"

col 9 - 10 Sequence number

col 11 - 18 Item identification for the first item. Left justify the identification in the
field.

col 19 - 20 Blank

col 21 - 28 Item identification for the second item.

col 29 - 30 Blank

etc. etc.

Enter 7 item identifications per record, repeat for as many records as necessary, 
increasing the sequence number for each record. For example, record with 
sequence number 2 will contain the identifications for items 8 through 14. 
Detailed description of output:

Unit 6 Printed output file 
The printout contains: 

Check on input parameters and defaults. 

For the nonparametric ERF, the point estimates, the input var/cov matrix, the 
var/cov for the sampled IRF’s for both the a,b,c scale and the 
transformed scale, and the nonparametric ERF for a spaced sample of 
the 0 grid points are printed. 

For the estimation of the parameters for the fitted ERF, the likelihood is
printed for each stage as well as the maximum derivatives for the three
parameters, the maximum change in an iteration, and the maximum
change over all iterations for each type of parameter. If the common c
is being computed, information on the computation of the common c
values is printed.

For each item there is a parameter code that indicates which item parameters 
are being estimated. The values for the codes are defined in the input 
description. In addition, a 20 is added to the code if the c for an item 
is held fixed at the common c. If an item is removed because the 
expected matrix of second derivatives had a zero determinant, the 
parameter code is set to 996. 

The final item parameter estimates are printed as well as the standard errors of 
the estimates. 

If abilities are estimated, the EAP ability estimates and the standard errors are 
printed for the point estimate IRF, the nonparametric ERF and the fitted 
ERF. Only the first and last 10 are printed. 

Unit 7 Item parameter output in LOGIST7 format. The abilities written are the
pseudo-abilities used to estimate the fitted ERF’s. A subroutine to read this
file is included with the program. The subroutine contains comment statements
that describe the calling arguments. Output includes the title, the number of
items, the number of pseudo-examinees, the estimated item parameters, the
pseudo-abilities, variables used in the estimation of c, and parameter code
indicator for number of parameters estimated.

Unit 13 File containing the nonparametric item response functions for plotting with the
plot program. The first record contains the title of the run. The second record
contains the number of items (I5). The third record contains the M abilities for
the θ grid in the format (5X,10F8.4). The remaining records contain the item
sequence number, the item number, the item identification, the a,b,c point
estimates, a,b,c estimates for the fitted ERF, the parameter code, and the
nonparametric proportion correct for the M abilities in the format
(2I5,A8,1X,3F12.6,1X,3F12.6,I4/(10F12.6)).
Unit 14 Output file containing the sample of item response functions, if it was
requested that it be saved. For each item, the item number and the three
parameters for each sampled IRF are written in the format
(I4,12F12.6/(4X,12F12.6)).

Record 1: col 1 - 4: Item number
col 5 - 16: a for first item sampled
col 17 - 28: b for first item sampled
col 29 - 40: c for first item sampled
col 41 - 52: a for second item sampled
etc. etc.

Unit 15 Output file containing ability estimates and standard errors, and item responses, 
if abilities are estimated. 

For each examinee a record is written in the format (I5,7F12.6,600I1)
containing:

col 1 - 5 - examinee sequence number
col 6 - 17 - true ability (if responses are read, this is set to 999999.)
col 18 - 29 - EAP ability computed using point estimate IRF
col 30 - 41 - EAP ability computed using fitted ERF
col 42 - 53 - EAP ability computed using nonparametric ERF
col 54 - 65 - Standard error of ability computed using point estimate IRF
col 66 - 77 - Standard error of ability computed using the fitted ERF
col 78 - 89 - Standard error of ability computed using nonparametric ERF
col 90 + Item responses in I1 format, items 1 to #ITEMS.
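
A record in this layout can be split by column position. The Python sketch below is illustrative only: the function and field names are hypothetical, and it assumes each response is a single digit with no embedded blanks.

```python
def parse_ability_record(line):
    """Split one Unit 15 record laid out as (I5,7F12.6,600I1)."""
    names = ["true_theta", "eap_point", "eap_fitted", "eap_nonpar",
             "se_point", "se_fitted", "se_nonpar"]
    record = {"examinee": int(line[0:5])}
    for k, name in enumerate(names):
        start = 5 + 12 * k                      # cols 6-17, 18-29, ..., 78-89
        record[name] = float(line[start:start + 12])
    record["responses"] = [int(ch) for ch in line[89:].rstrip()]   # col 90 onward
    return record

# Hypothetical record, for illustration only.
values = [0.25, 0.31, 0.30, 0.29, 0.41, 0.44, 0.45]
line = f"{7:5d}" + "".join(f"{v:12.6f}" for v in values) + "10110"
rec = parse_ability_record(line)
```

Comparing `se_point` with `se_fitted` and `se_nonpar` across examinees gives the kind of summary reported in Table 3.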



The PLOTIRF Program 

A plot program was also developed that plots the three item response functions for
comparison of the three curves. This program produces plots on the screen, a laser printer, or 
a postscript printer. Input to the program consists of a sysin file with the control parameters 
and the file written on the unit 13 by the EXPRESFN program. One, four or eight plots per 
page are possible. 

Input 

The sysin file consists of a set of records defining the input and output files and a few
control parameters. 

Record set defining files. 

The set contains one record for each file to be defined. 

The last record in this set must be blank. 
The format for the file definition card is:

col 1 F

col 3 - 4 Unit number

col 6-45 File name, with all qualifiers

The files to be specified are:

Input files:

Unit 5     Sysin file containing file definitions and parameters.

Unit 13    File written on unit 13 in EXPRESFN containing the nonparametric
           item response functions.

Output file:

Unit 9     Plot output if requested that the plots be saved for printing later.



Record set specifying control parameters. 

The last record in this set must be a blank record. 



Parameter    Description / Options                      Default

TITLE        Title for plots.                           Title from
                                                        EXPRESFN.

IFSELIT      Select items from items in                 NO
             EXPRESFN run.

PLOTDEV      Plotting device:                           LASER
             POSTSCRIPT
             LASER - HP laser printer
             SCREEN - only display on screen.

#PLOTPAGE    Number of plots per page. Options are      8
             1, 4, or 8.

SAVEPLOT     Plot now or write plots to file?           NO
             NO - print plots now
             YES - save plots to a file for
             printing later.

Record set 3. 

If IFSELIT is YES to select items from the EXPRESFN run, specify the items to
select with this record set.

The format of record set 3 is as follows:

col 1 - 8 "IFSELIT"

col 9 - 10 Sequence number for this IFSELIT record.

col 11 - 15 Item number of first item to be selected.

col 16 - 20 Item number of second item to be selected.

etc. etc.

col 76 - 80 Item number of 14th item to be selected.

Indicate a sequence of item numbers by entering the first in the sequence and the negative of
the last in the sequence. Repeat for as many cards as necessary. Increase the sequence
number for each card. Do not split a sequence across two records. If the beginning of a
sequence would be the last field of a record, leave the last field blank and start the sequence
on the next record.